"From Manus to Claude Code, AI agents are making their way into the everyday life of every one of us."
In this talk, we'll answer that question by going through the anatomy of one. We'll explore:
Every LLM call in the agent uses structured generation: the model must respond in a predefined JSON schema, making outputs machine-readable and reliable.
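To make the idea concrete, here is a minimal sketch of schema-checked output using only the Python standard library. The `AgentStep` shape, the `parse_step` helper and the action names are hypothetical, chosen for illustration; the talk does not show the agent's actual schema, and a real implementation would more likely rely on a schema library or the provider's structured-output mode.

```python
import json
from dataclasses import dataclass

# Hypothetical action names; the real agent's schema is not shown in the talk.
ALLOWED_ACTIONS = {"think", "tool_call", "stop"}


@dataclass
class AgentStep:
    action: str
    content: str


def parse_step(raw: str) -> AgentStep:
    """Parse a raw LLM reply and reject anything that is off-schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"action", "content"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    if not isinstance(data["content"], str):
        raise ValueError("content must be a string")
    return AgentStep(action=data["action"], content=data["content"])
```

Because every reply either becomes a typed `AgentStep` or raises `ValueError`, the code that follows never has to guess at free-form text.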
These four operations, driven by structured outputs, form the basis of the agent loop.
The agent loop is a LlamaIndex Agent Workflow — an event-driven, stepwise execution engine that connects the LLM to the external world.
After each Observe, the loop restarts from Think — until the LLM decides all tasks and sub-tasks are done and produces a Stop event.
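The loop can be sketched in plain Python. This is not the LlamaIndex Agent Workflow API, just an illustration of the control flow. The talk names Think and Observe; the Act step in between, the `run_agent` function, the decision dictionary format and the scripted stand-in for the LLM are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


def run_agent(
    llm: Callable[[List[str]], Dict[str, Any]],
    tools: Dict[str, Callable[[str], Any]],
    task: str,
    max_steps: int = 10,
) -> Any:
    """Repeat Think -> Act -> Observe until the LLM returns a stop decision."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        decision = llm(history)                              # Think
        if decision["type"] == "stop":                       # the Stop event
            return decision["answer"]
        result = tools[decision["tool"]](decision["input"])  # Act
        history.append(f"observation: {result}")             # Observe
    raise RuntimeError("step budget exhausted before the LLM stopped")


def scripted_llm(history: List[str]) -> Dict[str, Any]:
    """Stand-in for a real model: call one tool, then stop."""
    if len(history) == 1:
        return {"type": "tool", "tool": "word_count", "input": "a b c"}
    return {"type": "stop", "answer": history[-1]}


tools = {"word_count": lambda text: len(text.split())}
```

Calling `run_agent(scripted_llm, tools, "count words")` goes through one Think, Act, Observe round and then stops. The `max_steps` budget is a safety net added here so that a model that never stops cannot loop forever.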
The LLM loop is a generalist architecture. What defines a document processing agent are the interfaces through which it interacts with the external environment.
Even if the agent is tricked into writing malicious files, the real machine's filesystem remains unaffected: the virtual FS absorbs the damage, which reaches the real machine only if you explicitly sync the two.
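A minimal sketch of the idea, assuming an in-memory overlay in which writes stay virtual until an explicit sync. The `VirtualFS` class and its methods are hypothetical and do not reflect the agent's actual implementation.

```python
import os


class VirtualFS:
    """In-memory overlay: writes stay virtual until sync() is called."""

    def __init__(self, real_root: str) -> None:
        self.real_root = os.path.realpath(real_root)
        self._files: dict[str, str] = {}

    def _resolve(self, path: str) -> str:
        # Refuse paths that would escape the root (e.g. "../../etc/passwd").
        target = os.path.realpath(os.path.join(self.real_root, path))
        if os.path.commonpath([self.real_root, target]) != self.real_root:
            raise ValueError(f"path escapes root: {path!r}")
        return target

    def write(self, path: str, content: str) -> None:
        self._resolve(path)          # validate, but do not touch the disk
        self._files[path] = content

    def read(self, path: str) -> str:
        if path in self._files:      # virtual copy shadows the real file
            return self._files[path]
        with open(self._resolve(path), encoding="utf-8") as f:
            return f.read()

    def sync(self) -> None:
        """The only operation that modifies the real filesystem."""
        for path, content in self._files.items():
            target = self._resolve(path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "w", encoding="utf-8") as f:
                f.write(content)
```

Whatever the agent writes lives only in `_files`; a human (or a review step) decides whether `sync()` is ever called.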
Beyond plain-text file ops, the document agent uses LlamaParse tools that genuinely understand unstructured content (PDFs, Word docs, PowerPoint, Excel and more).
Document workflows can take minutes to half an hour. The agent works in the background and pings you when finished: just like a colleague, not a spinner :)
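One way to picture "a colleague, not a spinner" is a background task with a completion callback. The sketch below uses `asyncio`; the `process_document` coroutine and the list standing in for a notification channel are placeholders, not the agent's real mechanism.

```python
import asyncio


async def process_document(name: str, seconds: float) -> str:
    """Placeholder for a long-running document workflow."""
    await asyncio.sleep(seconds)
    return f"{name}: processed"


async def main() -> tuple[str, list[str]]:
    notifications: list[str] = []  # stand-in for an email or chat ping

    # Start the job in the background and register the "ping".
    task = asyncio.create_task(process_document("report.pdf", 0.01))
    task.add_done_callback(lambda t: notifications.append(t.result()))

    # The caller is not blocked and can keep working in the meantime.
    other_work = "still responsive"

    await task              # only so the sketch exits after the job finishes
    await asyncio.sleep(0)  # yield once so pending callbacks have run
    return other_work, notifications
```

Running `asyncio.run(main())` shows the caller continuing its own work while the job runs, with the notification arriving once the task completes.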
This is not a 100% guarantee. Prompt injection can still exfiltrate data from documents the agent has access to. Always monitor what the agent sees and how it behaves.